Understanding Impartial Versus Utility-Driven Quality Assessment in Large Datasets

Authors

  • Adir Even
  • Ganesan Shankaranarayanan

Abstract

Establishing and sustaining very high data quality in complex data environments is expensive and often practically impossible. Quantitative assessments of quality can provide important inputs for prioritizing improvement efforts. This study explores a methodology that combines impartial and utility-driven assessments of data quality. Impartial assessments measure the extent to which data is defective. Utility-driven assessments measure the extent to which the presence of quality defects degrades the utility of that data within a specific context of usage. The methodology is evaluated empirically using real-life alumni data, a large data resource that supports managing alumni relations and initiating pledge campaigns. The results provide important inputs that can direct the implementation and management of quality improvement policies in this data repository.
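To make the contrast between the two kinds of assessment concrete, here is a minimal sketch under simplifying assumptions that do not come from the study itself. Each record is marked as either defective or not, and each carries a hypothetical utility weight (for alumni data, one might imagine something like an expected pledge value). The impartial score counts every record equally, while the utility-driven score weights records by utility and assumes a defect wipes out that record's utility entirely. The names `Record`, `impartial_quality`, and `utility_driven_quality` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record: whether it has any quality defect (e.g. a missing
    # or invalid field) and an assumed utility weight in a given usage context.
    is_defective: bool
    utility: float


def impartial_quality(records: list[Record]) -> float:
    """Share of defect-free records; every record counts equally."""
    if not records:
        return 1.0
    clean = sum(1 for r in records if not r.is_defective)
    return clean / len(records)


def utility_driven_quality(records: list[Record]) -> float:
    """Share of total utility held by defect-free records.

    Simplifying assumption: a defective record contributes no utility.
    """
    total = sum(r.utility for r in records)
    if total == 0:
        return 1.0
    clean_utility = sum(r.utility for r in records if not r.is_defective)
    return clean_utility / total


# Illustrative data: half the records are defective, but the defects fall
# on low-utility records.
alumni = [
    Record(is_defective=False, utility=10.0),
    Record(is_defective=True, utility=1.0),
    Record(is_defective=True, utility=1.0),
    Record(is_defective=False, utility=8.0),
]

print(f"impartial: {impartial_quality(alumni):.2f}")           # impartial: 0.50
print(f"utility-driven: {utility_driven_quality(alumni):.2f}")  # utility-driven: 0.90
```

The gap between the two scores is the point: an impartial view suggests half the data is bad, while the utility-weighted view suggests the defects cost relatively little in this (assumed) usage context, which is the kind of signal that can help prioritize improvement efforts.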

Similar resources

Impartial versus Utility-driven Assessment of Data Quality: Methodology, Insights, and Implications for Managing Customer Data

This study presents a methodology for dual assessments of data. Impartial assessment measures the extent to which data is defective. Utility-driven assessments of data quality measure the extent to which the presence of quality defects degrades data utility – the benefit gained from using that data in a specific business setting. The dual assessment methodology is demonstrated in a real-world s...

Metrics-Driven Framework for LOD Quality Assessment

The main objective of the Linked Open Data paradigm is to crystallize knowledge through the interlinking of already existing but dispersed data. The usefulness of the developed knowledge depends strongly on the quality of the aggregated and published data. Researchers have observed many challenges with the quality of Linked Open Data; therefore, our main objective in this thesis is to propose a...

Assessing and Refining Mappings to RDF to Improve Dataset Quality

RDF dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor the assessment into the publishing workflow. Adjustments are applied manually, but rarely. Nevertheless, the root of the violations which often derive from the mappings that specify how the RDF dataset will be genera...

Chapter 2: Data-Driven View of Disease Biology

Modern experimental strategies often generate genome-scale measurements of human tissues or cell lines in various physiological states. Investigators often use these datasets individually to help elucidate molecular mechanisms of human diseases. Here we discuss approaches that effectively weight and integrate hundreds of heterogeneous datasets to gene-gene networks that focus on a specific proc...

Application of a Cost-Driven Optimization Method in Beer Brewing Process

The final quality and cost of a manufactured product are determined to a large extent by the engineering design of the product and its production process through activities of off-line quality control methods, namely, System Design, Parameter Design and Tolerance Design. However, in the context of most non-industrialized countries, the off-line quality activities of product design and system de...

Publication date: 2007